Abstract. This article is concerned with the application of the program extraction technique
to a new class of problems: the synthesis of decision procedures for the classical satisfiability
problem that are correct by construction. To this end, we formalize a completeness
proof for the DPLL proof system and extract a SAT solver from it. When applied to a 
propositional formula in conjunctive normal form the program produces either a satisfying 
assignment or a DPLL derivation showing its unsatisfiability. We use non-computational 
quantifiers to remove redundant computational content from the extracted program and 
translate it into Haskell to improve performance. We also prove the equivalence between 
the resolution proof system and the DPLL proof system with a bound on the size of the 
resulting resolution proof. This demonstrates that it is possible to capture quantitative 
information about the extracted program on the proof level. The formalization is carried 
out in the interactive proof assistant Minlog. 


1. Introduction 

In order for verification tools to be used in an industrial context they have to be trusted to 
a high degree and in many cases are required to be certified. We present a new application
of program extraction to develop a formally verified decision procedure for the satisfiability 
problem for propositional formulae in conjunctive normal form. The procedure is based on 
the DPLL proof system [HI US] which is also the basis of most contemporary SAT solvers 
that are used in an industrial context. 

The need for verified SAT solvers is obvious: they are part of safety-critical software,
and are also used for the verification and certification thereof. SAT solvers are nowadays highly
optimized for speed, which makes the introduction of errors (in the process of optimization)
more likely, and their verification more difficult. Besides correctness, totality
(or universality) of SAT solvers is also an issue. For example, in the 2012 SAT competition
(www.smtcomp.org) many systems were not total in the sense that they returned “Unknown”
for certain inputs, signifying that they could not deal with the given problem.

In this paper we report on the extraction of a SAT solver that is both correct and
total by construction. In addition, it produces in the unsatisfiable case a formal proof of 
this fact, which is recognized in the SAT community as a highly desirable feature of SAT 
solvers. To be more precise, we formalize a correctness and completeness proof of the DPLL 
proof system in the interactive theorem prover Minlog, and use Minlog’s program extraction
facilities to obtain a formally verified SAT solving algorithm. When run on a CNF formula 
it produces a model satisfying the formula or a DPLL derivation showing its unsatisfiability. 
We also prove the equivalence of DPLL and resolution and extract a program translating 
DPLL proofs into resolution proofs of smaller or equal size. 

Minlog [3I1121I1] is an interactive proof assistant based on a first-order natural deduction 
calculus. It implements various methods of program extraction such as realizability [23] 
(which can be viewed as a technical rendering of the Curry-Howard correspondence [15, 20])
and the Dialectica interpretation. It also extends program extraction to classical proofs 
via the Friedman/Dragalin A-translation. All these techniques are refined and optimized 
in order to improve usability and to obtain simpler programs. In addition to extracting 
a program from a proof, Minlog also automatically extracts a proof that the program 
meets its specification; see for instance [l2| for an overview on program extraction and its 
underlying theory. A number of substantial case studies on program extraction have been 
carried out, ranging from the extraction of a normalization-by-evaluation algorithm [3] to
the extraction of programs in constructive analysis m- Recent developments concentrate 
on program extraction for induction and coinduction, including applications in the context 
of exact real number computation [5]. 

An optimization in Minlog that is particularly important for this paper is the use 
of so-called non-computational quantifiers, which flag certain information in the proof as 
computationally irrelevant, and therefore allow for the removal of computational redundancy 
in the extracted program. In case of the extracted SAT solver, this leads to a significant 
improvement. 

We also applied an automatic translation of Minlog terms into Haskell code to the
extracted program and observed a further dramatic improvement in performance. We evaluate
the performance of our extracted solver by comparing it 1) with another verified SAT
solver, Versat [36], using pigeon hole formulae and 2) with an industrial tool, SCADE [T],
by means of an example from the railway domain.

An earlier version of this article, containing partial results, was reported at the 
MFPS 2012 conference [25].

1.1. Related Work. There are several other systems supporting program extraction from
proofs for the purpose of producing formally verified programs. An early example is the
Nuprl system [13]; other mature interactive theorem provers that implement program extraction
are Coq [8], which is based on the Calculus of Inductive Constructions, and Isabelle
[35], a generic theorem prover with extensions for many logics (see [7] for code generation
and [6] for program extraction from proofs in Isabelle). More recently, other interactive
theorem provers based on dependent types [30], such as Agda [H] and Idris m, have
emerged which realize the Curry-Howard correspondence and therefore can also be viewed
as supporting program extraction.

The Coq system has been used in several approaches to formalize automatic theorem
proving. Lescuyer and Conchon [26] program a SAT solver based on the DPLL algorithm
as a recursive function in Coq, and verify its soundness and completeness formally in the
system. The solver is then instantiated on the propositional fragment of Coq’s logic, creating
a user-friendly proof tactic. Similarly, Verma et al. |l6] formalize Binary Decision
Diagrams in Coq, prove their correctness, and extract certified BDD algorithms in OCaml.
The main reason for their formalization was to integrate symbolic model checking in Coq.
Significant work has also been performed in Isabelle, with several decision procedures verified
and integrated into the system. The DPLL algorithm has been formalized by Marić and
Janičić [28]. This approach was extended to formalize a SAT solver including optimizations
such as clause learning and the lazy two-watched-literal data structure m- The authors
investigated automatic code generation, but in the end the verified algorithm was manually
translated into C code. The automatic theorem prover Metis m is used inside Isabelle to
reconstruct proofs from faster external procedures such as the ones used in Sledgehammer
m- A different direction to deal with the correctness of SAT solvers has been to verify a
proof checker for resolution proofs [H]. This will check and guarantee that the output from
a solver for a particular SAT problem is correct.

The DPLL solver Versat [36], mentioned earlier, was formalized and verified in the
dependently typed programming language Guru |l5] and then translated into imperative C
code. This translation is possible because Guru contains mutable arrays. Since Guru allows
for the verification of low-level optimizations involving such arrays and Versat implements
clause learning, the resulting solver is quite efficient. However, this approach differs from
ours in that only soundness has been proven for Versat, whilst we have the possibility
to deliver a proof in the case of unsatisfiability. This means that while every satisfying
assignment produced by Versat can be trusted, it is not guaranteed that Versat can solve
every solvable problem.

A program extraction project related to ours was carried out by Weich m who gave 
two constructive proofs of the decidability of intuitionistic propositional logic and extracted 
two different programs that, for a given formula, either produce a derivation in intuitionistic 
sequent calculus, or a Kripke counter-model. The second proof and program extraction were 
formalized in Minlog for the implicational fragment. 

The articles [26, 28] verifying a DPLL SAT solver (in both Coq and Isabelle) were the
main motivation for our work. Their approaches involve a formalization of the algorithm
to be verified. In contrast, we work in a system that does not require any formalization
of algorithms. It is enough to prove that each CNF formula is either unsatisfiable or has
a model. The desired SAT solving algorithm and its correctness proof are then extracted
fully automatically.


2. Preliminaries 

We begin with some basic definitions, following [26, 28].

Definition 2.1.

(1) A literal $l$ is either a positive variable $+v$ or a negative variable $-v$, i.e. a variable $v$
with a label $+$ or $-$ attached.

(2) For every literal $l$ we define the opposite literal $\bar{l}$ by $\overline{+v} = -v$, $\overline{-v} = +v$.

(3) We set $\mathrm{Var}(+v) = \mathrm{Var}(-v) = v$, $\mathrm{Var}(L) = \{\mathrm{Var}(l) \mid l \in L\}$ for a set of literals $L$, and
$\mathrm{Var}(\Delta) = \bigcup\{\mathrm{Var}(L) \mid L \in \Delta\}$ for a set of sets of literals $\Delta$.

(4) A clause $C$ is a finite set of literals to be viewed as their disjunction.

(5) A formula in conjunctive normal form (CNF) is a finite conjunction of clauses. By a
formula $\Delta$ we will always mean a formula in CNF, and we will identify it with a finite
set of clauses $\{C_1, \ldots, C_n\}$, representing the conjunction of the $C_i$.

(6) A valuation $\Gamma$ is a finite set of literals to be viewed as their conjunction.

(7) A valuation $\Gamma$ is consistent if $\forall l\,(l \in \Gamma \to \bar{l} \notin \Gamma)$. We let $\mathrm{Cons}$ denote the set of all
consistent valuations.

(8) A model is a total function $M$ which maps literals¹ to booleans and satisfies the property

yi{M l^ I). 

We shall use the abbreviations

• $M \models \Gamma$, for $\forall l \in \Gamma\,(M\,l)$ (‘$M$ is a model of $\Gamma$’),

• $M \models \Delta$, for $\forall C \in \Delta\ \exists l \in C\,(M\,l)$ (‘$M$ is a model of $\Delta$’).

We call a valuation $\Gamma$ and a formula $\Delta$ compatible if there exists a model satisfying both,
i.e. $\exists M\,(M \models \Gamma \land M \models \Delta)$; otherwise $\Gamma$ and $\Delta$ are called incompatible.

A sequent $\Gamma \vdash \Delta$ is a pair consisting of a valuation and a formula. The intended
meaning of a sequent $\Gamma \vdash \Delta$ is that $\Gamma$ and $\Delta$ are incompatible. As a special case, when
$\Gamma$ is empty, $\vdash \Delta$ means that $\Delta$ is unsatisfiable. In the following we use the notations
$A, a := \{x \mid x \in A \lor x = a\}$ and $A \setminus a := \{x \mid x \in A \land x \neq a\}$.
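
The definitions above translate almost literally into a functional language. The following Haskell sketch is one possible encoding, not the Minlog formalization itself: the names `Lit`, `neg`, `consistent`, `modelsValuation`, `modelsFormula`, `with` and `without` are chosen here purely for illustration, variables are assumed to be integers, and finite sets are represented by lists.

```haskell
-- Illustrative encoding of the basic definitions; all names are assumptions of this sketch.
type Var = Int

data Lit = Pos Var | Neg Var
  deriving (Eq, Show)

type Clause    = [Lit]        -- finite set of literals, read as a disjunction
type Formula   = [Clause]     -- finite set of clauses, read as a conjunction
type Valuation = [Lit]        -- finite set of literals, read as a conjunction
type Model     = Lit -> Bool  -- total function from literals to booleans

-- Opposite literal.
neg :: Lit -> Lit
neg (Pos v) = Neg v
neg (Neg v) = Pos v

-- Underlying variable of a literal.
var :: Lit -> Var
var (Pos v) = v
var (Neg v) = v

-- A valuation is consistent if it never contains a literal together with its opposite.
consistent :: Valuation -> Bool
consistent g = all (\l -> neg l `notElem` g) g

-- M |= Gamma: every literal of the valuation holds in the model.
modelsValuation :: Model -> Valuation -> Bool
modelsValuation m = all m

-- M |= Delta: every clause contains a literal that holds in the model.
modelsFormula :: Model -> Formula -> Bool
modelsFormula m = all (any m)

-- The notations "A, a" and "A \ a" on list-represented sets.
with :: Eq a => [a] -> a -> [a]
with xs a = if a `elem` xs then xs else a : xs

without :: Eq a => [a] -> a -> [a]
without xs a = filter (/= a) xs
```

With this encoding, compatibility of a valuation `g` and a formula `d` amounts to the existence of some `m` with `modelsValuation m g && modelsFormula m d`.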

Definition 2.2 (DPLL Proof System). The DPLL proof system consists of five rules: 


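$$\frac{\Gamma, l \vdash \Delta}{\Gamma \vdash \Delta, \{l\}}\ (\text{Unit})
\qquad
\frac{\Gamma, l \vdash \Delta, C}{\Gamma, l \vdash \Delta, (C, \bar{l})}\ (\text{Red})
\qquad
\frac{\Gamma, l \vdash \Delta}{\Gamma, l \vdash \Delta, (C, l)}\ (\text{Elim})$$

$$\frac{}{\Gamma \vdash \Delta, \emptyset}\ (\text{Conflict})
\qquad
\frac{\Gamma, l \vdash \Delta \qquad \Gamma, \bar{l} \vdash \Delta}{\Gamma \vdash \Delta}\ (\text{Split})$$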


Several variants of the DPLL proof system have featured in the literature. The above
definition is closest to the Coq formalisation [26]; other formalisations such as [28] and [19]
combine the Unit, Red and Elim rules to form a single rule called the “1-literal rule” or
“unit propagation”.


3. Soundness and Completeness 

3.1. Soundness and Completeness of DPLL. In this section we sketch the formal proof 
of soundness and completeness of the DPLL proof system. We will be very brief with the 
Soundness Theorem since its proof does not carry computational content and a similar proof 
is carried out in [26, 28]. On the other hand, we will describe the proof of the Completeness
Theorem in some detail since we extract our SAT solver from it. 

We first reformulate the DPLL proof system as an inductive definition that can be
immediately formalized in the Minlog system. The definition has a clause for each rule. We
notationally identify a sequent $\Gamma \vdash \Delta$ with the statement ‘$\Gamma \vdash \Delta$ is derivable’.

¹ We map literals instead of variables as a model is constructed from a set of literals in the form of a
valuation. 
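
Remark 3.1. The proof system described in Definition 2.2 has been reformulated for
our theorem prover. The set of sequents $\Gamma \vdash \Delta$ is defined inductively by the following
(universally quantified) inductive clauses:

Conflict  $\emptyset \in \Delta \to \Gamma \vdash \Delta$

Unit  $\{l\} \in \Delta \to \Gamma, l \vdash \Delta \setminus \{l\} \to \Gamma \vdash \Delta$

Elim  $l \in \Gamma \to l \in C \to C \in \Delta \to \Gamma \vdash \Delta \setminus C \to \Gamma \vdash \Delta$

Red  $l \in \Gamma \to \bar{l} \in C \to C \in \Delta \to \Gamma \vdash (\Delta \setminus C), (C \setminus \bar{l}) \to \Gamma \vdash \Delta$

Split  $\Gamma, l \vdash \Delta \to \Gamma, \bar{l} \vdash \Delta \to \Gamma \vdash \Delta$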
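
Read from conclusion to premises, the inductive clauses describe how a candidate derivation can be checked against a valuation and a formula. The sketch below illustrates this under stated assumptions and is not code produced by Minlog: the type `Deriv` and the function `check` are hypothetical names, sets are represented by lists, and two clauses count as equal only if they list the same literals in the same order.

```haskell
data Lit = Pos Int | Neg Int deriving (Eq, Show)

type Clause    = [Lit]
type Formula   = [Clause]
type Valuation = [Lit]

neg :: Lit -> Lit
neg (Pos v) = Neg v
neg (Neg v) = Pos v

-- One constructor per inductive clause; each stores only the data needed to replay the rule.
data Deriv
  = Conflict
  | Unit Lit Deriv
  | Elim Clause Lit Deriv
  | Red Clause Lit Deriv
  | Split Lit Deriv Deriv
  deriving Show

-- check g delta d: does d derive the sequent "g |- delta"?
check :: Valuation -> Formula -> Deriv -> Bool
check _ delta Conflict =
  [] `elem` delta
check g delta (Unit l d) =
  [l] `elem` delta && check (l : g) (filter (/= [l]) delta) d
check g delta (Elim c l d) =
  l `elem` g && l `elem` c && c `elem` delta
    && check g (filter (/= c) delta) d
check g delta (Red c l d) =
  l `elem` g && neg l `elem` c && c `elem` delta
    && check g (filter (/= neg l) c : filter (/= c) delta) d
check g delta (Split l d1 d2) =
  check (l : g) delta d1 && check (neg l : g) delta d2
```

A derivation of unsatisfiability for a formula `delta` is then any `d` with `check [] delta d`.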

Theorem 3.2 (Soundness). If $\Gamma \vdash \Delta$, then $\Gamma$ and $\Delta$ are incompatible.

The proof proceeds by structural induction on the given derivation of the sequent $\Gamma \vdash \Delta$.
We omit further details.

We now turn our attention to the Completeness Theorem for the DPLL proof system. 
The expected statement of completeness is: 

$$\forall \Gamma \in \mathrm{Cons}\ \forall \Delta\ (\mathrm{incompatible}(\Gamma, \Delta) \to \Gamma \vdash \Delta).$$

A constructive proof of this statement would yield a program that computes a DPLL
proof for incompatible $\Gamma$, $\Delta$. We reformulate the statement by replacing the implication
‘$\mathrm{incompatible}(\Gamma, \Delta) \to \Gamma \vdash \Delta$’ with the classically equivalent but constructively stronger
disjunction ‘$\mathrm{compatible}(\Gamma, \Delta) \lor \Gamma \vdash \Delta$’. In this way, we obtain an enhanced program that
still computes a DPLL proof for incompatible $\Gamma$, $\Delta$, but in addition produces a model if $\Gamma$
and $\Delta$ are compatible.

Theorem 3.3 (Completeness of DPLL).

$$\forall \Gamma \in \mathrm{Cons}\ \forall \Delta\ (\mathrm{compatible}(\Gamma, \Delta) \lor \Gamma \vdash \Delta)$$

Proof. We aim to perform the proof in such a way that an efficient program is extracted. 
Therefore, we adopt the following strategy: 

(1) Since performing a Split rule is the only computationally expensive operation - it is the
only rule forcing the proof search to branch - we only apply it if absolutely necessary. 

(2) We perform an optimization on the proof level by partitioning the clauses into ‘clean’ 
and ‘unclean’ clauses, where a clause is called clean if we cannot apply Elim, Red 
or Unit to that clause. This increases the efficiency of the algorithm by reducing the 
number of comparisons needed. 

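To this end we show that for all valuations $\Gamma$ and formulae $\Delta$, $\Theta$,

$$\emptyset \notin \Theta \land \Gamma \in \mathrm{Cons} \land \mathrm{Var}(\Gamma) \cap \mathrm{Var}(\Theta) = \emptyset \to
(\Gamma \vdash \Delta \cup \Theta) \lor \exists M\,(M \models \Gamma \land M \models \Delta \cup \Theta).$$

The proof is by main induction on the measure

$$\mu(\Gamma; \Delta; \Theta) := |(\Delta \cup \Theta) \mathbin{\backslash\backslash} \mathrm{Var}(\Gamma)| + \#(\Delta) + \#(\Theta)$$

where

$|A|$ := the cardinality of $A$

$\Delta \mathbin{\backslash\backslash} V := \{l \mid \exists C \in \Delta\,(l \in C) \land \mathrm{Var}(l) \notin V\}$

$\#(\Delta) := \sum_{C \in \Delta} |C|$

and a side induction on $|\Delta|$ (i.e. the number of clauses in $\Delta$).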
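
As a small illustration of the measure, the following Haskell sketch computes it for list-represented sets. The names `mu`, `size` and `unassigned` are hypothetical, and the lists are assumed to be free of duplicate clauses so that sums over lists agree with sums over sets.

```haskell
import Data.List (nub)

data Lit = Pos Int | Neg Int deriving (Eq, Show)

type Clause    = [Lit]
type Formula   = [Clause]
type Valuation = [Lit]

var :: Lit -> Int
var (Pos v) = v
var (Neg v) = v

-- #(Delta): total number of literal occurrences in the formula.
size :: Formula -> Int
size = sum . map length

-- The distinct literals of the formula whose variable is not mentioned in the valuation.
unassigned :: Valuation -> Formula -> [Lit]
unassigned g f = nub [ l | c <- f, l <- c, var l `notElem` map var g ]

-- The induction measure on (Gamma; Delta; Theta).
mu :: Valuation -> Formula -> Formula -> Int
mu g delta theta =
  length (unassigned g (delta ++ theta)) + size delta + size theta
```

For instance, processing the unit clause of `[[Pos 1], [Neg 1, Pos 2]]` by extending the valuation with `Pos 1` lowers the measure, whereas moving a clause from the second argument to the third leaves it unchanged, which is where the side induction is needed.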

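Let $\Gamma$, $\Delta$, $\Theta$ be given such that $\emptyset \notin \Theta$, $\Gamma \in \mathrm{Cons}$, and $\mathrm{Var}(\Gamma) \cap \mathrm{Var}(\Theta) = \emptyset$.

Case 1 $\Delta = \emptyset$.

Case 1.1 $\Theta = \emptyset$.

We define a model $M$ by $M(l) = \mathrm{True} \leftrightarrow l \in \Gamma$. Then $M \models \Gamma \land M \models \emptyset$ holds.

Case 1.2 $\Theta \neq \emptyset$.

Let $C$ be a clause in $\Theta$ and let $l \in C$ ($C \neq \emptyset$, by the assumption on $\Theta$). Then $\mu((\Gamma, l); \Theta; \emptyset) <
\mu(\Gamma; \emptyset; \Theta)$ since $|\Theta \mathbin{\backslash\backslash} \mathrm{Var}(\Gamma, l)| < |\Theta \mathbin{\backslash\backslash} \mathrm{Var}(\Gamma)|$ and $\#(\Theta) + \#(\emptyset) = \#(\emptyset) + \#(\Theta)$. Furthermore,
for the values $(\Gamma, l)$, $\Theta$, $\emptyset$ the hypotheses of the theorem are clearly satisfied. Hence the
induction hypothesis for these values yields

$$(\Gamma, l \vdash \Theta) \lor \exists M\,(M \models \Gamma, l \land M \models \Theta) \qquad (3.1)$$

Similarly, we can apply the induction hypothesis to $(\Gamma, \bar{l})$, $\Theta$, and $\emptyset$ yielding

$$(\Gamma, \bar{l} \vdash \Theta) \lor \exists M\,(M \models \Gamma, \bar{l} \land M \models \Theta) \qquad (3.2)$$

The disjunctions (3.1) and (3.2) result in 4 cases: In the case that $\Gamma, l \vdash \Theta$ and $\Gamma, \bar{l} \vdash \Theta$
hold the Split rule is applied and we obtain $\Gamma \vdash \Theta$. In all other cases we use one of the
models obtained from the induction hypotheses.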

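Case 2 $\Delta = \Delta', C$.

We perform a case distinction on whether the valuation $\Gamma$ has a literal in common with $C$.

Case 2.1 $\Gamma \cap C = \emptyset$.

We perform a further case distinction on the cardinality of the clause $C$.

Case 2.1.1 $C = \emptyset$.

It suffices to show $\Gamma \vdash (\Delta', \emptyset) \cup \Theta$. This follows from the Conflict rule.

Case 2.1.2 $C = \{l\}$.

If $\bar{l} \in \Gamma$, then $\Gamma \vdash (\Delta', \{l\}) \cup \Theta$ can be derived by applying (in backwards fashion) the
Red rule followed by the Conflict rule. If $\bar{l} \notin \Gamma$, then we use the induction hypothesis
with $(\Gamma, l)$, $\Delta' \cup \Theta$, $\emptyset$. This is possible since $\mu((\Gamma, l); \Delta' \cup \Theta; \emptyset) < \mu(\Gamma; (\Delta', \{l\}); \Theta)$ because
$|(\Delta' \cup \Theta) \mathbin{\backslash\backslash} \mathrm{Var}(\Gamma, l)| < |((\Delta', \{l\}) \cup \Theta) \mathbin{\backslash\backslash} \mathrm{Var}(\Gamma)|$ and $\#(\Delta' \cup \Theta) < \#(\Delta', \{l\}) + \#(\Theta)$. Since for
the values $(\Gamma, l)$, $\Delta' \cup \Theta$, $\emptyset$ the hypotheses of the theorem are satisfied (in particular, $\Gamma, l$ is consistent
since $\bar{l} \notin \Gamma$), we obtain the disjunction $(\Gamma, l \vdash \Delta' \cup \Theta) \lor \exists M\,(M \models \Gamma, l \land M \models (\Delta' \cup \Theta))$. In
the case that $\Gamma, l \vdash \Delta' \cup \Theta$ holds we apply the Unit rule resulting in $\Gamma \vdash \Delta \cup \Theta$. In the
other case we have a model of $\Gamma, l$ and $\Delta' \cup \Theta$ which clearly also models $\Gamma$ and $\Delta \cup \Theta$.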

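Case 2.1.3 $|C| \ge 2$.

We perform a case distinction on $\exists l\,(l \in C \land \bar{l} \in \Gamma) \lor \neg\exists l\,(l \in C \land \bar{l} \in \Gamma)$. This disjunction
can be proven constructively, since the sets involved are finite.

Case 2.1.3.1 $\bar{l} \in \Gamma$ for some $l \in C$.

Then we have $\mu(\Gamma; (\Delta', C \setminus l); \Theta) < \mu(\Gamma; (\Delta', C); \Theta)$ since $\#(\Delta', C \setminus l) < \#(\Delta', C)$ and
$|(\Delta', C \setminus l) \mathbin{\backslash\backslash} \mathrm{Var}(\Gamma)| = |(\Delta', C) \mathbin{\backslash\backslash} \mathrm{Var}(\Gamma)|$. The hypotheses of the theorem are satisfied for the
chosen values. Hence we obtain, by induction hypothesis, $(\Gamma \vdash (\Delta', (C \setminus l)) \cup \Theta) \lor \exists M\,(M \models
\Gamma \land M \models (\Delta', (C \setminus l)) \cup \Theta)$. In the case that $\Gamma \vdash (\Delta', (C \setminus l)) \cup \Theta$ holds, we apply the Red
rule. In the other case we have a model of $\Gamma$ and $(\Delta', (C \setminus l)) \cup \Theta$ which also models the
weaker formula $(\Delta', C) \cup \Theta$.

Case 2.1.3.2 $\neg\exists l\,(l \in C \land \bar{l} \in \Gamma)$.

In this case we may move $C$ from $\Delta$ to $\Theta$: Since $\mu(\Gamma; \Delta'; (\Theta, C)) \le \mu(\Gamma; (\Delta', C); \Theta)$ we can
apply the side induction hypothesis to $\Gamma$, $\Delta'$, $(\Theta, C)$. Since for these values the hypotheses of
the theorem are satisfied we obtain $\Gamma \vdash \Delta' \cup (\Theta, C) \lor \exists M\,(M \models \Gamma \land M \models \Delta' \cup (\Theta, C))$ which
is the same as the required disjunction $\Gamma \vdash (\Delta', C) \cup \Theta \lor \exists M\,(M \models \Gamma \land M \models (\Delta', C) \cup \Theta)$.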

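Case 2.2 $\Gamma \cap C \neq \emptyset$.

We can prove constructively that in this case $\Gamma$ and $C$ have some literal $l$ in common. We
apply the induction hypothesis to $\Gamma$, $(\Delta', (C \setminus l))$, $\Theta$. Since clearly the measure decreases,
$\#(\Delta', (C \setminus l)) < \#(\Delta', C)$ and $|(\Delta', (C \setminus l)) \mathbin{\backslash\backslash} \mathrm{Var}(\Gamma)| = |(\Delta', C) \mathbin{\backslash\backslash} \mathrm{Var}(\Gamma)|$, and the hypotheses
of the theorem are satisfied, we obtain $\Gamma \vdash (\Delta', (C \setminus l)) \cup \Theta$ or $\exists M\,(M \models \Gamma \land M \models
(\Delta', (C \setminus l)) \cup \Theta)$. In the first case we apply the Elim rule, in the second case we use the
model provided. □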
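
The case analysis of the proof can be read as a recursive procedure. The following Haskell sketch is written by hand to illustrate that reading and is not the program extracted by Minlog. Its assumptions: the names `solve`, `dpll`, `onDeriv` and `Deriv` are hypothetical, sets are lists, clause equality is list equality, the cases are tested in a slightly different order than in the proof, and for a clause that already contains a literal of the valuation the sketch drops the whole clause, as in the Elim clause of Remark 3.1. Termination relies on the decreasing measure rather than on an explicit check.

```haskell
data Lit = Pos Int | Neg Int deriving (Eq, Show)

type Clause    = [Lit]
type Formula   = [Clause]
type Valuation = [Lit]
type Model     = Lit -> Bool

data Deriv
  = Conflict
  | Unit Lit Deriv
  | Elim Clause Lit Deriv
  | Red Clause Lit Deriv
  | Split Lit Deriv Deriv
  deriving (Eq, Show)

neg :: Lit -> Lit
neg (Pos v) = Neg v
neg (Neg v) = Pos v

-- Apply a rule on the derivation branch; pass models through unchanged.
onDeriv :: (Deriv -> Deriv) -> Either Deriv Model -> Either Deriv Model
onDeriv rule = either (Left . rule) Right

-- solve g delta theta: delta holds the clauses still to be inspected, theta the clean ones.
-- Invariant: clauses in theta are non-empty and share no variable with g.
solve :: Valuation -> Formula -> Formula -> Either Deriv Model
solve g [] [] = Right (`elem` g)                        -- Case 1.1: read off a model
solve g [] theta@(c : _) = case c of                    -- Case 1.2: branch on a literal
  []      -> Left Conflict                              -- excluded by the invariant
  (l : _) -> case (solve (l : g) theta [], solve (neg l : g) theta []) of
    (Left d1, Left d2) -> Left (Split l d1 d2)
    (Right m, _)       -> Right m
    (_, Right m)       -> Right m
solve g (c : delta) theta
  | (l : _) <- filter (`elem` g) c =                    -- Case 2.2: clause already satisfied
      onDeriv (Elim c l) (solve g rest theta)
  | (l : _) <- filter ((`elem` g) . neg) c =            -- a literal of c is refuted: Red
      onDeriv (Red c (neg l)) (solve g (filter (/= l) c : rest) theta)
  | null c = Left Conflict                              -- Case 2.1.1
  | [l] <- c =                                          -- Case 2.1.2: unit clause
      onDeriv (Unit l) (solve (l : g) (rest ++ theta) [])
  | otherwise = solve g rest (c : theta)                -- Case 2.1.3.2: c is clean
  where
    rest = filter (/= c) delta

-- Entry point: empty valuation, no clean clauses yet.
dpll :: Formula -> Either Deriv Model
dpll delta = solve [] delta []
```

A call `dpll delta` returns `Left d` with a derivation `d` when the sketch finds `delta` unsatisfiable, and `Right m` with a model `m` otherwise. Because the pair in Case 1.2 is matched lazily, the second branch is only explored when the first one fails to produce a model.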

3.2. Resolution. The resolution proof system [39] is widely used in practical applications,
for instance in tools for proof checking and debugging [U] or interchange between different
solvers [22]. State-of-the-art SAT solvers such as MiniSAT [18] and zChaff [33] return
(extended) resolution proofs for unsatisfiable problems. By formalizing that every DPLL
derivation has an equivalent resolution derivation, and combining this result with the completeness
proof from the previous section, we can extract a SAT solver which produces
resolution derivations. The equivalence of DPLL and resolution was first shown by Robinson
[40] who translated between the two proof systems using semantic trees.

By enriching the systems with size information we are able to show that the size of the 
resulting resolution proof does not exceed the size of the original DPLL proof. 

For every valuation $\Gamma$ we define a clause $\bar{\Gamma}$ representing its negation by
$\overline{\{l_1, \ldots, l_k\}} = \{\bar{l}_1, \ldots, \bar{l}_k\}$.

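Definition 3.4 (Resolution Proof System). The derivable resolution sequents $\Delta \vdash^{n}_{\mathrm{Res}} C$ with
a derivation of size $n$ are conveniently defined by two rules: subsumption (or axiom) and
resolution.

$$\frac{}{\Delta, C \vdash^{0}_{\mathrm{Res}} C'}\ (\text{Sub})\quad C \subseteq C'
\qquad
\frac{\Delta \vdash^{n}_{\mathrm{Res}} C \lor l \qquad \Delta \vdash^{m}_{\mathrm{Res}} C' \lor \bar{l}}{\Delta \vdash^{n+m+1}_{\mathrm{Res}} C \lor C'}\ (\text{Res})$$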


We also need a version of the DPLL proof system with added bounds in order to speak 
about the sizes of the proofs. 

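Remark 3.5 (Derivable refined DPLL sequents). The derivable DPLL sequents $\Gamma \vdash^{n}_{\mathrm{DPLL}} \Delta$
with a derivation of size $n$ are inductively defined by the following clauses:

Conflict  $\emptyset \in \Delta \to \Gamma \vdash^{0}_{\mathrm{DPLL}} \Delta$

Unit  $\{l\} \in \Delta \to \Gamma, l \vdash^{n}_{\mathrm{DPLL}} \Delta \setminus \{l\} \to \Gamma \vdash^{n+1}_{\mathrm{DPLL}} \Delta$

Elim  $l \in \Gamma \to l \in C \to C \in \Delta \to \Gamma \vdash^{n}_{\mathrm{DPLL}} \Delta \setminus C \to \Gamma \vdash^{n+1}_{\mathrm{DPLL}} \Delta$

Red  $l \in \Gamma \to \bar{l} \in C \to C \in \Delta \to \Gamma \vdash^{n}_{\mathrm{DPLL}} (\Delta \setminus C), (C \setminus \bar{l}) \to \Gamma \vdash^{n+1}_{\mathrm{DPLL}} \Delta$

Split  $\Gamma, l \vdash^{n}_{\mathrm{DPLL}} \Delta \to \Gamma, \bar{l} \vdash^{m}_{\mathrm{DPLL}} \Delta \to \Gamma \vdash^{n+m+1}_{\mathrm{DPLL}} \Delta$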

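Remark 3.6. The resolution proof system from Definition 3.4 has been reformulated as
follows for our theorem prover. The derivable resolution sequents $\Delta \vdash^{n}_{\mathrm{Res}} C$ with a derivation
of size $n$ are inductively defined by the following clauses:

Sub  $C_0 \in \Delta \to C_0 \subseteq C \to \Delta \vdash^{0}_{\mathrm{Res}} C$

Res  $\Delta \vdash^{n}_{\mathrm{Res}} (C \lor l) \to \Delta \vdash^{m}_{\mathrm{Res}} (C' \lor \bar{l}) \to \Delta \vdash^{n+m+1}_{\mathrm{Res}} (C \lor C')$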
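
To make the size bookkeeping concrete, here is a hedged Haskell sketch of resolution derivations with their size and conclusion. The names `ResDeriv`, `size` and `conclusion` are hypothetical, clauses are lists, and a subsumption step is assumed to have size 0, which is what the cost argument in the proof of Theorem 3.7 suggests.

```haskell
import Control.Monad (guard)
import Data.List (nub)

data Lit = Pos Int | Neg Int deriving (Eq, Show)

type Clause  = [Lit]
type Formula = [Clause]

neg :: Lit -> Lit
neg (Pos v) = Neg v
neg (Neg v) = Pos v

data ResDeriv
  = Sub Clause Clause          -- Sub c0 c: c0 is a clause of the formula and c0 is a subset of c
  | Res Lit ResDeriv ResDeriv  -- Res l d1 d2: d1 concludes "C or l", d2 concludes "C' or neg l"
  deriving Show

-- Size of a derivation: number of resolution steps (subsumption assumed free).
size :: ResDeriv -> Int
size (Sub _ _)     = 0
size (Res _ d1 d2) = size d1 + size d2 + 1

-- Conclusion of a derivation over a formula, or Nothing if some step is ill-formed.
conclusion :: Formula -> ResDeriv -> Maybe Clause
conclusion delta (Sub c0 c) = do
  guard (c0 `elem` delta && all (`elem` c) c0)
  return c
conclusion delta (Res l d1 d2) = do
  c1 <- conclusion delta d1
  c2 <- conclusion delta d2
  guard (l `elem` c1 && neg l `elem` c2)
  return (nub (filter (/= l) c1 ++ filter (/= neg l) c2))
```

A refutation of a formula `delta` is a derivation `d` with `conclusion delta d == Just []`.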


Theorem 3.7 (DPLL implies Resolution). For all consistent valuations $\Gamma$, CNF formulae
$\Delta$ and natural numbers $n$: If $\Gamma \vdash^{n}_{\mathrm{DPLL}} \Delta$, then $\Delta \vdash^{m}_{\mathrm{Res}} \bar{\Gamma}$ for some $m \le n$.

Proof. The proof is an easy induction on DPLL derivations. We only sketch the overall
idea. The Conflict and Split rules translate into the Sub and Res rules respectively. Each
of these rules has the same cost as the corresponding DPLL rule, so the size of the
derivation does not increase. An application of the Unit rule is a special case of the Res
rule in which one of the branches is obtained via a subsumption of a unit clause. Again the
size does not increase, since the cost of performing the Sub rule and the Res rule
together is the same as that of the Unit rule. Finally, both the Elim and Red DPLL rules
correspond to a form of weakening in the resolution proof, which comes at no cost, so
the resulting resolution proofs are smaller in size than the DPLL proofs. □

Remark 3.8. One can also easily prove that resolution implies DPLL; more precisely, if
$\Delta \vdash_{\mathrm{Res}} C$, then $\bar{C} \vdash_{\mathrm{DPLL}} \Delta$. However, as long as the sizes of derivations are measured only in
terms of the number of applications of rules (as we do above), no size bound can be given.
The reason is that the translation of one instance of the subsumption rule

$$\frac{}{\Delta, C \vdash_{\mathrm{Res}} C'}\ (\text{Sub})\quad C \subseteq C'$$

into DPLL requires $n$ applications of the Red rule where $n$ is the number of literals in $C$.


The Completeness Theorem for DPLL (Theorem 3.3), adapted to the DPLL system
with size information, and Theorem 3.7 immediately imply:

Theorem 3.9 (Completeness of the Resolution Proof System).

$$\forall \Delta\ ((\exists M\, M \models \Delta) \lor (\exists n\, \Delta \vdash^{n}_{\mathrm{Res}} \emptyset))$$

The program extracted from Theorem 3.7 translates DPLL derivations into equivalent
resolution derivations. This translator and the SAT solver extracted from the Completeness
Theorem for DPLL (Theorem 3.3) are combined in the program extracted from Theorem 3.9
to a SAT solver that yields resolution refutations for unsatisfiable formulae. Since the
computationally hard and interesting part of this program is entirely contained in the DPLL-based
SAT solver, we will restrict our attention to the latter when we discuss the extracted
programs in detail in Sect. 5.


4. Program Extraction 

4.1. Theory. Program extraction in Minlog is based on modified realizability [23]. We
highlight a few aspects that are important to understand the optimizations we achieved.
For a complete and precise description of program extraction we refer to [42]. A formula is
said to have computational content if it has at least one occurrence of $\exists$ or $\lor$ at a strictly
positive position. To every such formula $A$ one assigns a type $\tau(A)$ of ‘potential realizers’.
If the formula has no computational content, one sets $\tau(A) = \varepsilon$. From a proof of a formula
$A$ with computational content one can extract a program $M$ of type $\tau(A)$ that realizes $A$
(written $M \mathbin{\mathbf{r}} A$), that is, $M$ solves the computational problem expressed by $A$. In order
to fine-tune the computational content, in particular to remove redundant content, Minlog
offers, besides the usual quantifiers $\forall$ and $\exists$, the non-computational (nc) quantifiers $\forall^{nc}$
and $\exists^{nc}$ (which roughly correspond to quantification in Prop in Coq). These have the same
logical meaning as the usual quantifiers, but indicate that the extracted program does not
operate on the quantified variable, only on its realizer. The definitions of the type and the
realizability relations for the ordinary universal quantifier contrasted with its nc version are:

$$\tau(\forall x^{\rho} A) = \rho \to \tau(A) \qquad f \mathbin{\mathbf{r}} \forall x^{\rho} A = \forall x^{\rho}\,(f(x) \mathbin{\mathbf{r}} A)$$

$$\tau(\forall^{nc} x^{\rho} A) = \tau(A) \qquad a \mathbin{\mathbf{r}} \forall^{nc} x^{\rho} A = \forall x^{\rho}\,(a \mathbin{\mathbf{r}} A)$$

Similarly for the two versions of the existential quantifier:

$$\tau(\exists x^{\rho} A) = \rho \times \tau(A) \qquad (a, y) \mathbin{\mathbf{r}} \exists x^{\rho} A = a \mathbin{\mathbf{r}} A[y/x]$$

$$\tau(\exists^{nc} x^{\rho} A) = \tau(A) \qquad a \mathbin{\mathbf{r}} \exists^{nc} x^{\rho} A = \exists x^{\rho}\,(a \mathbin{\mathbf{r}} A)$$

One sees that for the nc-quantifiers the realizers do not depend on the quantified variables.
The program extraction procedure respects the different kinds of quantifiers by omitting
in the nc case any information corresponding to the quantified variable. The proof rules
for the nc-quantifiers are subject to stricter variable conditions ensuring that the omitted
information is indeed not needed in the extracted program. Minlog is able to automatically
detect the maximal set of occurrences of quantifiers in a proof that can be made
non-computational without compromising the correctness of the proof [38]. This holds for the
logical parts of the proof only. In the formalization of inductive definitions one has to
manually place $\forall^{nc}$ quantifiers.

4.2. Extraction to Haskell. The programs extracted by Minlog are terms in Minlog’s
internal term language. This has the advantage that extracted programs can be reused for 
further proofs, and properties of the programs can be formally proven, again inside Minlog. 
Furthermore, the extracted programs are provably correct, and a (soundness) proof of this 
fact is automatically generated by Minlog. However, there are also inherent disadvantages: 
the interoperability of the extracted programs with external libraries or devices is limited, 
and executing the programs is sometimes slow. For both these reasons, it makes sense 
to translate the extracted programs into more conventional, general-purpose programming 
languages. Minlog implements a translation to Haskell (and also a limited translation to 
Scheme). Extracting to a lazy language such as Haskell makes the treatment of coinduction 
and corecursion (which is not used in our example) particularly simple [32]. 

There is a close fit between Haskell and the Minlog term language, and the translation is
quite straightforward; basic terms such as variables, lambda abstractions, etc., are translated
to the corresponding Haskell terms. Standard algebras such as lists, integers, booleans,
sum and product types are translated to their implementation in the Haskell Prelude, while
user-defined algebras in general are translated to algebraic data types. Natural numbers are
translated to (unbounded) integers for efficiency.² Program constants and their computation
rules in Minlog correspond to functions defined by pattern matching in Haskell. Some
care must be taken with, for example, the natural numbers: in Minlog, pattern matching on natural
numbers is possible, but natural numbers are translated to integers, for which no pattern
matching is available in Haskell. Instead guard conditions have to be used. Recursion
operators, realizing structural induction, are automatically generated as Haskell functions 
by the translation. Minlog also supports general recursion along a decreasing measure, 
which makes sure that the program terminates. The Minlog implementation of the general 
recursion operator ensures that recursive calls are only made on arguments that are smaller 
than the current argument with respect to the measure: 

$$\mathrm{gRec} : (\rho \to \mathbb{N}) \to \rho \to (\rho \to (\rho \to \tau) \to \tau) \to \tau$$

$$\mathrm{gRec}(\mu, x, f) = f(x, \lambda y.\ \text{if } \mu(y) < \mu(x) \text{ then } \mathrm{gRec}(\mu, y, f) \text{ else } \mathrm{inhab}_{\tau})$$

($\mathrm{inhab}_{\tau}$ is a canonical inhabitant of $\tau$, justified by the fact that all domains are inhabited in
the intended, standard semantics). Note that the (potentially expensive) test $\mu(y) < \mu(x)$ is
computationally unnecessary, since at runtime we already know that our extracted program
will only use recursive calls on smaller arguments. However, this test is needed because of
Minlog’s eager evaluation strategy. Omitting the test:

$$\mathrm{gRec}(x, f) = f(x, \lambda y.\ \mathrm{gRec}(y, f)) \qquad (4.1)$$

would make Minlog get stuck in an endless loop, forever evaluating the recursive call
$\mathrm{gRec}(y, f)$ regardless of whether it is going to be used or not.

However, since Haskell is a lazy language, we can safely implement general recursion
using (4.1). This can give large efficiency gains in certain situations (see Section 6). In a
lazy setting, soundness of this variant of the program extraction process can still be proven,
and the Haskell translation supports this optimization. However, there is now a discrepancy
between Minlog programs and their Haskell translations: if called in a way that does not
respect the measure, the Minlog implementation of gRec will halt with an arbitrary value,
while the Haskell version will diverge. For this reason, the optimization can be turned on
and off with a switch, if identical behavior is important. Of course, every extracted term
will respect the measure.

² Using bounded Ints instead of unbounded Integers would of course not be sound. In the cases where it
would be safe to do so, it would also not result in any particular performance gains, since GHC stores small
Integers as Ints.
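
The two variants of general recursion can be written down directly in Haskell. This is a hand-written sketch, not the code emitted by Minlog's translation: the names `gRecGuarded`, `gRecLazy` and `sumStep` are hypothetical, and the canonical inhabitant is passed as an explicit first argument because the sketch has no type-indexed default value available.

```haskell
-- Guarded variant: recursive calls on arguments that are not smaller return the default value.
gRecGuarded :: r -> (a -> Integer) -> a -> (a -> (a -> r) -> r) -> r
gRecGuarded inhab mu x f =
  f x (\y -> if mu y < mu x then gRecGuarded inhab mu y f else inhab)

-- Unguarded variant in the style of (4.1): no measure test at all.
gRecLazy :: a -> (a -> (a -> r) -> r) -> r
gRecLazy x f = f x (\y -> gRecLazy y f)

-- Example step function: sum of the numbers 0..n, recursing on n - 1.
sumStep :: Integer -> (Integer -> Integer) -> Integer
sumStep n rec = if n <= 0 then 0 else n + rec (n - 1)
```

Both `gRecGuarded 0 id 4 sumStep` and `gRecLazy 4 sumStep` compute the same sum, since `sumStep` only recurses on smaller arguments. A step function that violates the measure, such as `\n rec -> rec (n + 1)`, makes the guarded variant return the default value, while the unguarded variant would loop.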


5. The Extracted Program 

The size of the DPLL formalization is approximately 5500 lines of Minlog code. The extracted
program comes to 300 lines of code as a Minlog term and 600 lines of Haskell code.
In the following we present two versions of our extracted solver: one optimized with $\forall^{nc}$
quantifiers, which we shall refer to as the $\forall^{nc}$ solver, and the other without these optimizations,
which we shall refer to as the $\forall$ solver.

The $\forall$ solver takes a CNF formula $\Delta$ represented as a list of clauses as input, and
produces either a model of $\Delta$ or a derivation of its unsatisfiability. Models are represented
as functions from literals to booleans. An algebraic data type for DPLL derivations is
automatically generated from its inductive definition in Minlog. It has five constructors,
one for each of the DPLL rules in Remark 3.1:

data Algdpll = CConflict Valu For
             | CElim Valu For Cla Lit Algdpll
             | CUnit Valu For Lit Algdpll
             | CRed Valu For Cla Lit Algdpll
             | CSplit Valu For Lit Algdpll Algdpll
             deriving (Show, Read, Eq, Ord)

Each constructor takes a formula and a valuation as arguments. The formula itself never
changes during the proof and is only part of the algebra for the purpose of proving correctness;
it does not play a role in any computation. While the valuation changes during
the proof search, these changes can be captured by indicating which literal was added by
the Unit and Split rules, thus making the valuation redundant as well. We added nc-quantifiers
to the definition by hand in order to remove redundant computational content,
resulting in

data Algdpll = CConflict
             | CElim Cla Lit Algdpll
             | CUnit Lit Algdpll
             | CRed Cla Lit Algdpll
             | CSplit Lit Algdpll Algdpll
             deriving (Show, Read, Eq, Ord)

The control structure of the program closely follows the structure of the case distinctions 
and proofs by induction performed in the proof. Lemmas invoked during the proof are 
extracted separately and called as procedures. Since the proof is by general induction along 
a measure, the main body of the program is using general recursion along the same measure. 
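
The claim that the valuation is redundant can be illustrated on the optimized data type: the valuation in force at every Conflict leaf can be rebuilt from the literals recorded by the Unit and Split constructors. In the sketch below, `Lit` and `Cla` are simplified stand-ins for the extracted types (an assumption of this example), and `leafValuations` and `size` are hypothetical helper names; `size` counts rule applications as in Remark 3.5.

```haskell
-- Stand-ins for the extracted literal and clause types (assumed shapes).
data Lit = Pos Int | Neg Int
  deriving (Show, Read, Eq, Ord)

newtype Cla = CC [Lit]
  deriving (Show, Read, Eq, Ord)

data Algdpll = CConflict
             | CElim Cla Lit Algdpll
             | CUnit Lit Algdpll
             | CRed Cla Lit Algdpll
             | CSplit Lit Algdpll Algdpll
             deriving (Show, Read, Eq, Ord)

neg :: Lit -> Lit
neg (Pos v) = Neg v
neg (Neg v) = Pos v

-- Valuation at each Conflict leaf, rebuilt from the literals stored in Unit and Split nodes.
leafValuations :: [Lit] -> Algdpll -> [[Lit]]
leafValuations g CConflict        = [g]
leafValuations g (CElim _ _ d)    = leafValuations g d
leafValuations g (CUnit l d)      = leafValuations (l : g) d
leafValuations g (CRed _ _ d)     = leafValuations g d
leafValuations g (CSplit l d1 d2) =
  leafValuations (l : g) d1 ++ leafValuations (neg l : g) d2

-- Number of rule applications, with Conflict counted as 0.
size :: Algdpll -> Integer
size CConflict        = 0
size (CElim _ _ d)    = size d + 1
size (CUnit _ d)      = size d + 1
size (CRed _ _ d)     = size d + 1
size (CSplit _ d1 d2) = size d1 + size d2 + 1
```

Starting from the empty valuation, `leafValuations [] d` lists one valuation per branch of the derivation `d`.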

6. Execution of the Extracted Program 

In the following we will see how both the $\forall$ and $\forall^{nc}$ solvers behave when they are applied to a
number of SAT problems. The extracted decision procedure was run on several instances
of the pigeon hole principle m, both in Minlog and as Haskell programs. The pigeon hole
principle states that there is no injective function that maps $\{1, 2, \ldots, n\}$ to $\{1, 2, \ldots, n-1\}$.

Definition 6.1 (Pigeon Hole Formula).

$$\mathrm{PHP}(n,m) := \{\{l_{i,1}, \ldots, l_{i,m}\} \mid 1 \le i \le n\} \cup \{\{\bar{l}_{i,k}, \bar{l}_{j,k}\} \mid 1 \le i < j \le n,\ 1 \le k \le m\}$$

Here $l_{i,k}$ represents the statement “pigeon $i$ sits in hole $k$”. The whole formula
$\mathrm{PHP}(n, m)$ states that $n$ pigeons sit in $m$ holes such that no two pigeons are in the same
hole. Hence, $\mathrm{PHP}(n,m)$ is satisfiable iff $n \le m$. For example, if we run our DPLL solver
with the formula $\mathrm{PHP}(2,1) = \{\{l_{11}\}, \{l_{21}\}, \{\bar{l}_{11}, \bar{l}_{21}\}\}$, the following derivation is produced:


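\[
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{
        \dfrac{}{l_{11}, l_{21} \vdash \{\}}\ \text{Conflict}
      }{l_{11}, l_{21} \vdash \{\neg l_{11}\}}\ \text{Red}
    }{l_{11}, l_{21} \vdash \{\neg l_{11}, \neg l_{21}\}}\ \text{Red}
  }{l_{11} \vdash \{l_{21}\}, \{\neg l_{11}, \neg l_{21}\}}\ \text{Unit}
}{\vdash \{l_{11}\}, \{l_{21}\}, \{\neg l_{11}, \neg l_{21}\}}\ \text{Unit}
\]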
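The clause set of Definition 6.1 is easy to generate mechanically. The sketch below is not the encoding used in the Minlog formalization: the representation of a variable as a (pigeon, hole) pair and the names `php` and `satisfiable` are assumptions made for illustration, and the brute-force check merely stands in for a solver on tiny instances.

```haskell
-- Hypothetical representation: a variable is a (pigeon, hole) pair.
type Var = (Int, Int)
data Lit = Pos Var | Neg Var deriving (Show, Eq, Ord)
type Cla = [Lit]
type For = [Cla]

-- PHP(n,m): every pigeon sits in some hole, and no two pigeons share a hole.
php :: Int -> Int -> For
php n m = everyPigeonSits ++ noSharedHole
  where
    everyPigeonSits = [[Pos (i, k) | k <- [1 .. m]] | i <- [1 .. n]]
    noSharedHole =
      [ [Neg (i, k), Neg (j, k)]
      | i <- [1 .. n], j <- [i + 1 .. n], k <- [1 .. m] ]

-- Brute-force satisfiability over all assignments; exponential in n*m,
-- so only meant as a sanity check on very small instances.
satisfiable :: Int -> Int -> Bool
satisfiable n m = any models assignments
  where
    vars = [(i, k) | i <- [1 .. n], k <- [1 .. m]]
    assignments = mapM (\v -> [(v, False), (v, True)]) vars
    models asg = all (any (holds asg)) (php n m)
    holds asg (Pos v) = lookup v asg == Just True
    holds asg (Neg v) = lookup v asg == Just False
```

For instance, `php 2 1` yields the three clauses of PHP(2,1) given above, and the brute-force check agrees with the characterization in the text on small cases such as PHP(2,1), PHP(2,2) and PHP(3,2).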


The following is the Minlog output for the pigeon hole formula PHP(2,1). There is
a constructor CsuccessZero of the algebra success which represents the disjunction in
the main proof statement. The data type extracted from this algebra can be seen as a
union type that contains either a DPLL derivation or a model of the formula. In this
case it contains a DPLL derivation showing that the formula is unsatisfiable. The
arguments to CsuccessZero store how the Conflict is derived. The literal $l_{11}$ is
represented as (Pos (Variable 11)) in the Minlog formalization and the clause
$\{\neg l_{11}, \neg l_{21}\}$ is represented as CC(Neg(Variable 21)::(Neg(Variable 11)):).

CsuccessZero
(CUnit
  (Pos(Variable 11))
  (CUnit
    (Pos(Variable 21))
    (CRed
      (CC(Neg(Variable 21)::(Neg(Variable 11)):))
      (Pos(Variable 21))
      (CRed
        (CC(Neg(Variable 11)):)
        (Pos(Variable 11))
        CConflict))))

Running the DPLL solver on a satisfiable formula results in a function which maps
literals to booleans. For example, running the solver with PHP(2,2) results in the function
$M : \text{literals} \to \mathbb{B}$ where $M(l) = \text{True}$ iff
$l \in \{l_{12}, \neg l_{11}, l_{21}, \neg l_{22}\}$. The Minlog output for the
satisfiable formula PHP(2,2) is as follows. Here the square brackets represent a lambda
abstraction for the literal $l_0$. The model $M$ is written as
$\lambda l_0.\ l_0 \in \{l_{12}, \neg l_{11}, l_{21}, \neg l_{22}\}$.

CsuccessOne
([l0]
  [if (l0=Pos(Variable 12))
    True
    [if (l0=Neg(Variable 11))
      True
      [if (l0=Pos(Variable 21))
        True
        (l0=Neg(Variable 22))]]])

Table 1: Performance in Minlog versus Haskell

           Minlog $\forall$  Minlog $\forall^{nc}$  Compiled (ghc -O2)    Compiled (ghc -O2 -fllvm)
Formula    Witness           Witness                Witness   Yes/No      Witness   Yes/No
PHP(4,3)   33.62s            11.61s                 0.019s    0.006s      0.015s    0.004s
PHP(4,4)   5.45s             5.25s                  0.019s    0.010s      0.014s    0.007s
PHP(5,4)   13m54s            2m41s                  0.055s    0.020s      0.036s    0.012s
PHP(5,5)   26.09s            25.03s                 0.024s    0.015s      0.020s    0.010s
PHP(6,5)   5h35m41s          37m25s                 0.367s    0.066s      0.279s    0.039s
PHP(6,6)   1m34.11s          1m24.88s               0.035s    0.025s      0.025s    0.015s
PHP(8,8)   -                 -                      0.054s    0.029s      0.040s    0.025s
PHP(9,8)   -                 -                      -         1m21.915s   -         32.062s
PHP(9,9)   -                 -                      0.064s    0.042s      0.052s    0.030s
PHP(10,9)  -                 -                      -         102m 16s    -         15m 5s

Table 2: Performance compared to Versat

Formula    $\forall^{nc}$ compiled (Yes/No)   Versat
PHP(7,6)   0.226s                             0.089s
PHP(8,7)   2.42s                              0.794s
PHP(9,8)   32.062s                            17.217s
PHP(10,9)  15m 5s                             15m 46s
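The model printed in the Minlog output for PHP(2,2) can be transcribed into Haskell and checked against the clause set by hand. This is only a sketch: the types below are hypothetical stand-ins for the extracted ones, `php22` writes out the clauses of PHP(2,2) using the variable numbering of the Minlog output, and the helpers `satisfies` and `consistent` are our own names.

```haskell
-- Hypothetical stand-ins for the extracted types.
newtype Variable = Variable Int deriving (Show, Eq)
data Lit = Pos Variable | Neg Variable deriving (Show, Eq)
type Cla = [Lit]
type For = [Cla]

-- Transcription of the printed model: a function from literals to booleans.
model :: Lit -> Bool
model l0
  | l0 == Pos (Variable 12) = True
  | l0 == Neg (Variable 11) = True
  | l0 == Pos (Variable 21) = True
  | otherwise = l0 == Neg (Variable 22)

-- The clauses of PHP(2,2), written out explicitly.
php22 :: For
php22 =
  [ [Pos (Variable 11), Pos (Variable 12)]
  , [Pos (Variable 21), Pos (Variable 22)]
  , [Neg (Variable 11), Neg (Variable 21)]
  , [Neg (Variable 12), Neg (Variable 22)]
  ]

-- A model satisfies a CNF formula if every clause contains a true literal.
satisfies :: (Lit -> Bool) -> For -> Bool
satisfies m = all (any m)

-- No variable may be assigned both polarities.
consistent :: (Lit -> Bool) -> [Variable] -> Bool
consistent m = all (\v -> not (m (Pos v) && m (Neg v)))
```

Here `satisfies model php22` holds: the model places pigeon 1 in hole 2 and pigeon 2 in hole 1, so each clause contains a true literal.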

6.1. Comparison of Program Performance. The $\forall$ solver and the $\forall^{nc}$ solver
were compared using both unsatisfiable PHP(n+1,n) and satisfiable PHP(n,n) pigeon hole
formulae. The unsatisfiable pigeon hole formulae are harder than the satisfiable ones, as
they have a large search space that must be traversed entirely by the solver in order to
construct a derivation. This difficulty can be seen (compare columns 2 and 3 in Table 1)
when both the $\forall$ and the $\forall^{nc}$ solver are applied to the unsatisfiable
pigeon hole formulae. The solver without the optimization takes considerably longer to
construct a derivation of unsatisfiability. This is due to computationally irrelevant data
being stored in the unoptimized derivations.

The next two columns of Table 1 present two versions of the $\forall^{nc}$ solver when
extracted to Haskell and compiled by the Glasgow Haskell Compiler (GHC). The first returns
a witness of the result, i.e. either a model which satisfies the formula or a derivation of
its unsatisfiability. The second returns only a Yes or No answer as to whether a formula is
satisfiable or not. Due to the inherent laziness of Haskell the two programs differ quite
dramatically in their behavior. The solver that returns a Yes/No answer performs
considerably faster than the solver which produces the witness in addition. By using the
Low Level Virtual Machine (LLVM) backend [24] for GHC, a further speed-up was achieved,
which can be seen in the last two columns of Table 1.
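The role of laziness in the Yes/No variant can be illustrated with a small sketch. It does not reproduce the extracted solver; it only shows the general mechanism by which a Haskell program that inspects the outer constructor of the result never forces the witness stored inside it. The constructor names follow the Minlog output, while `Success`, `isSatisfiable` and the stand-in types are assumptions.

```haskell
-- Hypothetical stand-ins for the extracted types.
newtype Variable = Variable Int deriving (Show, Eq)
data Lit = Pos Variable | Neg Variable deriving (Show, Eq)
type Cla = [Lit]

data Algdpll
  = CConflict
  | CElim Cla Lit Algdpll
  | CUnit Lit Algdpll
  | CRed Cla Lit Algdpll
  | CSplit Lit Algdpll Algdpll
  deriving (Show, Eq)

-- Union type: either a DPLL refutation or a model of the formula.
data Success
  = CsuccessZero Algdpll
  | CsuccessOne (Lit -> Bool)

-- Yes/No answer (True means satisfiable). Only the outer constructor
-- is examined, so the witness underneath is never evaluated.
isSatisfiable :: Success -> Bool
isSatisfiable (CsuccessZero _) = False
isSatisfiable (CsuccessOne _) = True

-- A result whose witness is deliberately left undefined: asking for the
-- Yes/No answer still terminates, because the field stays an unevaluated thunk.
unevaluatedRefutation :: Success
unevaluatedRefutation = CsuccessZero undefined
```

Evaluating `isSatisfiable unevaluatedRefutation` returns `False` without touching the undefined derivation, whereas printing or traversing the derivation would force it. How much of the witness construction the compiled solver actually skips depends on the extracted code and on GHC's optimizations.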
Table 3: Industrial case study: Extracted solver versus Versat

Formula                   $\forall^{nc}$ compiled (Yes/No)   Versat
$\neg R_1 \lor \neg R_3$  7.028s                             0.050s
$\neg R_1 \lor \neg R_4$  6.961s                             0.040s
$\neg R_2 \lor \neg R_3$  7.105s                             0.053s
$\neg R_2 \lor \neg R_4$  7.059s                             0.044s
$\neg R_3 \lor \neg R_4$  7.015s                             0.047s


We also compared the performance of our $\forall^{nc}$ solver, compiled using the LLVM
backend of GHC, with that of Versat [36]. Our solver was run with the option of not
computing a witness since Versat generally does not compute a proof. The results in Table 2
show that our solver is comparable with Versat. It is slower on the easier formulae and
faster on the hardest pigeon hole formula, PHP(10,9). This is because the clause learning
optimization of Versat has some overhead and does not increase the performance on pigeon
hole formulae. The point of the learned clauses is to reduce the search space for the
solver. In this case, they instead consume more memory and time to compute.

6.2. Industrial Case Study. The same version of our solver was also applied to the
verification of a real-world railway control system, provided by our industrial
partner Invensys Rail (now Siemens) as a description in Ladder logic. We adapted [21]
to translate Ladder logic programs into Minlog/Haskell and the industrial tool SCADE [1],
and also performed a comparison with Versat. The SAT problem is formulated to perform
falsification checking, as described in [13]: a satisfying assignment represents a
counterexample, and an unsatisfiable result means that the safety property cannot be
violated in the system. The case study comprises 14726 clauses and 8166 variables. For
comparison, we present the run-times for checking five safety conditions, each stating that
two conflicting routes, out of a set of four routes $R_1, \dots, R_4$, cannot be active in
the railway at the same time. For each of the five conditions our solver produces a proof
certifying that the safety property holds in approximately 7s. The SCADE suite can verify
that each of the safety properties holds in less than one second (the system provides no
greater accuracy of run-times for this case).
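Falsification checking as described above can be sketched in a few lines: the clauses of the system are conjoined with the negation of the safety clause, and the resulting formula is handed to a solver. Everything in the sketch is illustrative. The brute-force `findModel` merely stands in for the extracted solver, and the names `counterExample` and `toySystem` are hypothetical; the toy system is not the railway model.

```haskell
import Data.List (nub)
import Data.Maybe (listToMaybe)

data Lit = Pos Int | Neg Int deriving (Show, Eq)
type Cla = [Lit]
type For = [Cla]

var :: Lit -> Int
var (Pos v) = v
var (Neg v) = v

negLit :: Lit -> Lit
negLit (Pos v) = Neg v
negLit (Neg v) = Pos v

-- Brute-force search standing in for the solver (toy sizes only).
findModel :: For -> Maybe [(Int, Bool)]
findModel f = listToMaybe (filter models assignments)
  where
    vars = nub (map var (concat f))
    assignments = mapM (\v -> [(v, False), (v, True)]) vars
    models asg = all (any (holds asg)) f
    holds asg (Pos v) = lookup v asg == Just True
    holds asg (Neg v) = lookup v asg == Just False

-- Negating a clause gives one unit clause per literal. Nothing means the
-- property cannot be violated; Just m is a counterexample.
counterExample :: For -> Cla -> Maybe [(Int, Bool)]
counterExample system property =
  findModel (system ++ [[negLit l] | l <- property])

-- Toy system (hypothetical): variables 1..3 are routes R1..R3, variable 4
-- is a point P. Route R1 needs P set, route R3 needs P unset.
toySystem :: For
toySystem = [[Neg 1, Pos 4], [Neg 3, Neg 4]]
```

With this toy system, `counterExample toySystem [Neg 1, Neg 3]` returns `Nothing`, so the two routes cannot both be active, while `counterExample toySystem [Neg 1, Neg 2]` returns an assignment in which both routes are active, which is a counterexample to that property.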

While we cannot expect to compete with an industrial tool on speed and functionality,
we have been able to solve a large practical problem in a reasonable amount of time. It is
important to note that the solver inside the SCADE suite has not been formally verified
whereas our solver has. Interestingly, however, Versat also solves these problems in less
than one second (see Table 3 for a comparison between our extracted solver and Versat).
We may therefore conclude that optimizations such as clause learning and the use of
efficient data structures for parsing a formula and identifying its (un)satisfiability do
improve the performance on this type of problem, and that our extracted solver should be
extended by these optimizations as well.

7. Conclusion 

We have presented a conceptually new approach to the synthesis and verification of SAT 
algorithms that, in contrast to similar work in Coq and Isabelle [26(128] does not require the 
formalization of the SAT programs in the formal system, but obtains SAT algorithms purely
by program extraction. To this end, we formalized the DPLL proof system and performed a
constructive proof from which a correct SAT solving algorithm was extracted automatically.
The extracted program attempts to show the (un)satisfiability of a propositional formula
in conjunctive normal form. If the CNF formula is satisfiable it produces a model of the
formula; otherwise it produces a derivation showing unsatisfiability. We strategically
placed $\forall^{nc}$ quantifiers into the proof to reduce the complexity of the extracted
program and increase its performance. The solver containing $\forall^{nc}$ quantifiers was
extracted into the functional programming language Haskell, and the performance of the two
solvers was evaluated using pigeon hole formulae. We have also shown how it is possible to
extract a program that translates between DPLL and resolution proofs. This was done in such
a way that we obtain some qualitative information about quantitative aspects of the
extracted program, i.e. computational complexity. Using this translation it was possible to
extract a resolution solver based on the DPLL proof system.

Overall, our paper shows that the approach of developing verified programs via extraction
from proofs is scalable to non-trivial applications. Furthermore, it demonstrates how
to include efficiency considerations into this approach. For instance, we have avoided
repeated unnecessary look-ups of clauses by the split of clause sets in two sets A and 0.
This counters the often heard argument that with program extraction one 'loses the grip' on
the program and its efficiency. It is important to note that these efficiency considerations
do not compromise the correctness of the extracted program since these are applied at the
proof level where correctness is guaranteed by the proof system.

We consider the fact that our approach does not require any formalization of algorithms 
a major advantage, since it means that program development via extraction can be carried 
out in a formal system that is much more lightweight than in the verification approach,
where the term language must include a programming language, and the meaning of the 
programming constructs must be specified by axioms and proof rules. This advantage is 
particularly striking in applications in analysis 015] where corecursive exact real number 
algorithms (whose formalization and specification is non-trivial and subject of ongoing
research) can be automatically extracted from proofs involving only coinductive definitions
in the form of largest fixed points of predicate transformers.

7.1. Future Work. There are two directions for further work: applying our method to 
extract a more advanced class of SAT solvers, and applying our approach to a different 
class of decision problems. 

We are in the process of formalizing optimizations such as clause learning and conflict 
analysis [3 [Ml [29]. This requires a modification of the DPLL proof system such that it 
captures the additional behavior. A completeness theorem has been proven for the
modified calculus, and we have extracted a prototype clause learning solver from this
proof. In order for this solver to be an improvement on the previous one we need to lower
the computational overhead resulting from clause learning. Such a solver would also benefit
from lazy data structures such as the two-watched-literal scheme. It is unclear whether the
inherent laziness of Haskell will provide the same effect as these data structures or if they
would have to be formalized as part of the proof.

It is desirable to be able to solve not just propositional formulae but also first-order
formulae. This is possible by extending SAT algorithms so that they can apply some
background theory for first-order formulae. Such algorithms are called Satisfiability Modulo
Theories (SMT) solvers. We would have to formalize a proof system used by SMT solvers
such as abstract DPLL [34] and then perform a completeness proof. A solver extracted
from such a proof system would be able to solve a broader range of problems described in
a language richer than propositional logic.

7.2. Sources. The Minlog formalization optimized with $\forall^{nc}$ quantifiers and its
extracted program as Haskell code can be found at http://cs.swan.ac.uk/minlog/dpll/.